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Summary 

A  critical  procedure  for  use  in  linear  models  is  introduced  and 
developed  in  some  detail,  it  is  based  on  a  nonlinear  analogue  to  the 
usual  linear  least  squares  procedure,  more  specifically,  in  the  inte¬ 
grated  distance  between  character istlc  functions  (densities)  and  their 
sample  counterparts.  Location  and  scale  parameters  are  estimated 
simultaneously.  The  procedure  depends  on  a  user  specified  parameter 
which  may  be  varied  to  determine  the  sensitivity  of  the  parameters  and 
observational  weights  to  such  variation.  A  sensitivity  analysis  of  this 
type  is  useful  in  isolating  potential  problems  with  the  data  or  with  the 
assumed  model.  When  the  procedure  is  employed  with  the  user-oriented 
parameter  held  fixed,  a  robust  procedure  results.  The  statistical  pro¬ 
perties  of  this  procedure  are  discussed  in  some  detail.  A  number  of 
illustrations,  taken  from  the  literature,  are  examined. 

Key  words:  parametric  density  estimation,  sensitivity  analysis,  robust 
estimation  of  location  and  scale,  experimental  designs, 
integrated  distance,  characteristic  functions 


1.  Introduction 


We  introduce  in  this  paper  a  critical,  and  robust,  procedure  for 
the  analysis  of  experimental  design  data.  The  procedure  is  based  on  an 
integrated  residual  distance  between  the  model  characteristic  function  and 
its  sample  counterpart  or,  equivalently,  between  the  assumed  model  density 
and  its  Parzen  kernel  density  estimate.  The  motivation  for  such  a  pro¬ 
cedure  is  developed  in  some  detail  and  a  number  of  examples,  taken  from 
the  literature,  are  examined.  The  procedure  is  useful  in  identifying 
potential  outliers  and  potential  departures  from  assumptions  and  can  be 
adapted  to  arbitrary  types  of  experimental  designs  with  little  difficulty. 

A  number  of  statistical  properties  of  the  procedure  are  discussed. 

The  procedure  determines  the  sensitivity  of  the  parameter  estimates 
and  observation  weights  to  the  change  in  a  single  parameter  X.  Accordingly, 
a  range  of  values  of  this  parameter  X  may  be  utilized.  If  the  structural 
and  error  model  vis-a-vis  the  data  are  internally  consistent,  then  the 
parameter  estimates  and  observational  weights  are  stable  under  changes  of 
X.  If  internal  consistency  is  lacking,  either  because  some  data  are  not 
consistent  with  the  structural-error  model  or  because  the  model  is  not 
descriptive  of  the  data,  the  parameter  estimates  or  the  observational 
weights  wiP  not  be  stable  under  changes  in  X.  The  weights  provide  val¬ 
uable  diagnostic  tools.  The  procedure  also  may  be  used  with  fixed  X  as  a 
robust  method  of  data  analysis. 

The  literature  on  robustness  and  critical  methods  has  by  now  grown 
to  b6  very  extensive  and  a  detailed  review  does  not  seem  appropriate  here. 
Excellent  summaries  and  critiques  are  however  aval  1 able  in  recent  books 
by  Barnett  and  Lewis  (1978),  Huber  (1981),  and  Rey  (1977). 


2.  The  Distance  Between  Densities  Procedure 


In  a  traditional  least  squares  setting  the  jth  response  variable 
Yj  is  related  to  independent  variables  Xjq,  x^,  Xjp,  considered 

fixed,  under  the  assumption  that 

E(yjlxlj,x2j . xpj}  “  klQ  Skxjk  =  xj6’  (2,1a) 

or  in  matrix  notation 

E (y 1 X)  -  X6  (2.1b) 

for  n  mutually  independent  vectors  (y j ,XjQ,Xj 1 , . . .  ,Xjp)  .  The  quantities 
are  to  be  determined.  We  take  as  column  vectors  5  -  (Bq,8 j > . . . ,8p)T 

y  -  (y,»y2»--* »yn)T*  The  design  matrix  is  X  *  Ujk),  j  “  ] >2 . . 

k  *  0,1 . .  and  Xj  =  (Xjg.Xj^ , . . . ,Xjp)  is  a  row  vector  with  XjQ  3  1.  X  is 

of  full  rank  unless  explicitly  stated  otherwise.  For  each  j  we 
also  assume  that  y^.  -  XjB,  given  x . ,  is  normally  distributed  with  mean  0 
and  variance  a ;  in  short  y^  -  Xj8  is  M(0,o  ).  The  parameters  8  are 
determined  by  minimizing  over  the  8's  the  sum  of  squared  residuals 


m  l  (Yj-E(y  |x. 
J  j-1  J  J  - 


))‘ 


(2.2) 


2 

Under  the  Gaussian  (l.e.  normality)  assumption  an  estimate  of  o  is 
provided  by 

o2  -  (n-p-1)"1  ^  Cy j  -  &T  Xj)2  -  n.p~1  a2,  (2.3) 

where  the  vector  8  is  the  value  of  8  which  minimizes  (2.2)  and  a  is  the 

2  2 

maximum  likelihood  estimate  of  a  .  This  estimate  of  o  is  not  provided  by 
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the  least  squares  procedure.  The  underlying  Gaussian  error  structure 
is  not  incorporated  into  (2.2).  It  is  reasonable  to  inquire  if  the 
Gaussian  error  structure  may  be  incorporated  into  the  estimation  process 
in  such  a  way  that  the  least  squares  character  of  (2.2)  is  retained  while 
a  critical  nature  is  imparted  to  the  estimators  of  3  and  a  .  Huber  (1964, 

A 

1981,  Ch.  7)  suggest®  the  replacement  of  the  quadratic  function  ef  in  (2.2) 

by  another  convex  function  p(ej),  say,  which  does  not  increase  as  rapidly 

as  ej.  A  number  of  such  functions  p  have  been  proposed,  see  Rey  (1977). 

Hampel  (197*0  suggests  working  directly  with  score  or  influence  functions 

in  order  to  obtain  robustness  properties.  Our  integrated  distance  between 

densities  or  characteristic  function  approach  has  substantial  points  of 

contact  with  the  work  of  both  Huber  and  Hampel  and  also  with  the  work  of 

Parzen  (1962).  We  have  found  that  the  Gaussian  error  structure  may  be 

conveniently  and  intuitively  appealingly  introduced  into  an  estimation 

process  reminiscent  of  (2.2)  in  the  following  way. 

2 

The  y j ,  given  Xj,  are  independent  N(Xj3,o  )  and  hence  the  character¬ 
istic  function  of  the  y.  given  x.  is 


♦j (u)  «  E (exp ( l uy j | Xj ) )  -  exp(iuXjS  -  ia  u  ) , 


the  density  corresponding  to  (u)  is 


9-4  y-*te  y 

fj  (y)  ■  (2*oZ)  4  exp(-  i  (— J-)2), 


Proceeding  by  analogy,  we  replace  the  yj  in  (2.2)  by  exp(iuyj)  and 
corresponding  to  E(yj|xj)  we  put  E(exp(iuyj | x j ) )  ■  ♦j (u) . 


(2.4a) 


(2.4b) 
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The  analogue  to  the  sum  of  squares  in  (2.2)  is  the  sum  of  moduli  squared 

I  I  r  i  (u)  | 2  -  l  |exp(iuy.)  -  <J> .  (u)  | 2  (2.5) 

j-1  J  j-1  J  J 

wi  th 

2  2 

r j ( u)  -  exp(iuyj)  -  exp(iuXjB  -  i  a  u  )  (2.6) 

representing  a  functional  residual.  Clearly  E(rj(u)|xj)  ■  0  for  all  u. 

The  expression  (2.5)  is  of  little  statistical  use  unless  specific  values 
of  u  are  chosen  for  definiteness.  Close  examination  of  (2.5)  shows  that 
the  u-values  should  be  adaptively  chosen  or  problem  dependent.  For 
example,  large  values  of  XjB  produce  high  frequency  sinusoids  in  <|>j(u) 
and  unless  a  set  of  u  values  is  judiciously  chosen,  extraction  of  infor¬ 
mation  from  (2.5)  will  be  difficult.  The  difficulties  associated  with  the 
appropriate  sampling  rate  of  the  stochastic  process  n (u)  may  be  avoided 
by  adaptively  centering  and  scaling  the  yj  but  we  do  not  pursue  this 
avenue  here.  The  need  for  such  adaptation  was  first  noted  by  Paulson, 
Holcomb,  and  Leitch  (1975)  and  Leitch  and  Paulson  (1975).  Quandt  and 
Ramsey  (1978)  considered  a  moment  generating  function  analogue  of  (2.5) 
but  did  not  address  the  adaptation  question. 

Rather  than  specifying  a  fixed  and  finite  set  of  u-values  in  (2.5), 
we  will  weight  the  | (u) |  by  a  function  |w(u) |  to  be  determined  and 
integrate  out  over  u.  We  shall  also  make  the  weight  function  possess  an 
adaptive  nature.  As  we  shall  see,  there  are  definite  advantages  to  such 
a  course  of  action.  An  immediate  advantage  is  that  the  question  of  the 
rate  at  which  rj(u)  should  be  sampled  need  not  be  considered  because  the 
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integration  corresponds  to  continuous  sampling. 

2 

We  wish  to  estimate  the  parameters  3,  a  from  consideration  of 


J  *  l  [  |exp(iuy.)  -  exp(ix.6u  -  ±  a2 u2|2  |w(u)|2  du.  (2.7) 

j-1  *  J  J 

J  -00 

2 

Unlike  (2.2),  a  may  be  estimated  through  consideration  of  (2.7).  The 

quantity  Jn  represents  a  sum  of  integrated  distances.  Parseval's  theorem 

(Feller,  1966,  Ch.  19,  Heathcote,  1977)  allows  an  expression  of  J  in 

n 

terms  of  densities  if  w(u)  is  chosen  as  a  characteristic  function.  If 
w(u)  is  a  characteristic  function  with  density  fw(y)  then  we  obtain  the 
alternate  expression  in  sums  of  integrated  distance  between  densities, 


J 


n 


(fw(y-yj)  ‘  fw(y)*fj(y))2  dy 


(2.8) 


with  the  density  fj(y)  given  by  (2 . 4b) .  The  symbol  *  denotes  the  operation 
of  convolution  of  f  (y)  with  f.(y).  We  thus  see  that,  apart  from  questions 

W  J 

of  optimality,  a  practically  appealing  choice  for  the  density  fw (y)  is  the 

Gaussian  for  then  the  convolution  f  (y)*f.(y)  is  again  Gaussian.  (The 

w  j 

convolution  of  two  Gaussian  densities  represents  the  density  of  the  sum  of 
two  independent  Gaussian  variables.)  We  thus  take 

w(u)  -  expC-  ±  dy2) ,  (2.9) 


or  equivalently. 


fw(y)  *  (2ird)”*  exp(-  i  ^-) 


(2.10) 


The  choice  of  d  is  open  to  us  and  an  appropriate  choice  will  become 

apparent  as  we  proceed.  We  thus  have 

2  -1  /  (y-X:S)2  . 

f,(y)*fw(y)  ■  (2ir(a+d) )  *  exp(-  i  — s— 1 - )  , 

J  w  '  cj  +d  ' 


(2.11) 
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and  If  we  define  the  j th  residual  in  y  as 


(y-y 


-i  /  \y  y-.i  \  ,  ,  / 

ej  (y)  -  (2ird)  *  exp(-  i  —I — j  -  (2tt(cj  +d))  *  exp(-  i  — r-l - ),  (2.12) 

'  a  +d  ' 

then  (2.8)  becomes  the  sum  of  integrated  residuals 


n  <co 

Jn  *  <2ir)  l  [  e?(y)dy,  (2.13) 

j-i  L 

a  function  of  the  (y.,x.)  and  d.  The  quantity  f  (y-y.)  is  an  unbiased 

J  J  w  J 

Parzen  kernel  density  estimator  for  f.(y)*f  (y) .  The  spirit  of  our  work 

J  w 

is  quite  different  from  that  of  Parzen  (1962).  See,  however,  Heathcote 
(1978). 

The  appeal  of (2.13)  notwithstanding,  it  is  slightly  more  convenient 
to  work  with  the  characteristic  function  version  of  Jn  given  by  (2.7). 

With  w(u)  given  by  (2.9),  (2.7)  may  be  explicitly  integrated  to  give 


J 


n 


(yj-x.B) 

2d+o2 


2ir 

2o2+2d 


(2.14) 


Differentiation  of  (2.14)  with  respect  to  B  yields  the  estimating  equation 

(2.15) 


n  T  /  (yi  -x . 8) 2  , 

sn  "  l  X1  (y i“xl0)  expf-  i  — — j - 1*0; 

n  !-i  J  J  J  \  2d+a  ' 


2 

differentiation  of  (2.14)  with  respect  to  a  gives  the  estimating  equation 


I  [ — 7-372O 

j-1  1 (2d+a2)3/Z  ' 


(yr*-ieV 

2d+a2 


(y,-x,8)‘ 


•)  -(-  i  ^r-) 

'  x  2d+a  ' 


(2d+2o2) 3/2 


l-°. 


(2.16) 
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2  -2 

In  order  to  make  the  estimator  of  o  ,  say  5  ,  scale  Invariant  and  possess 

2 

an  adaptive  nature  we  choose  d  -  \a  .  From  (2.15)  and  (2.16)  we  find  that 
_  -2  2 

the  estimators  6  of  8  and  a  of  a  satisfy  the  Implicit  equation 


6 "  xjVj»]' 


6  -  (XTVXX)-1  XTVxy, 


(2.17a) 


(2.17b) 


l  (yJ  ~  jk  e^J  )xj^> 

p  2 

/  x.,  v., 

L  jk  jX 


k  -  0,1,, 


j/W’VjX 


l\^-(my2\ 

2  1  >T<V>  ~  V(*TVxX)',XTV1)y 


U.  18a) 


1+2X\  3/2 


(2.18b) 


tr(V  - "  (£§) 


where 


T  ^  T 

<  V.X  »  l  x .  x . v . .  , 
x  j-i  J  J  Jx 


V  oZ(l+2X)  ' 


(2.19) 


VX  *  dla9(v1X . vnX}  *  and  tr(Vx)  denotes  the  trace  of  V^. 

Since  the  estimators  §  and  a  are  implicitly  defined  in  the  system 
(2.17)  “  (2.19),  an  Iterative  solution  is  required.  We  have  found  a  fixed 
point  procedure  to  be  attractive  because  of  its  relative  simplicity  and 
ease  of  implementation.  Matrix  inverses  can  be  avoided  in  (2.17c)  and  this 
is  a  real  advantage  in  large  systems.  The  fixed  point  procedure  is  tmple- 
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mented  by  choosing  as  the  initial  guess  the  maximum  likelihood  estimates 
Sq  *  6  =  (XTX)  N^y,  5^  *  *  n  1  yT(I-X(X^X)  ^X^)y.  (This  corresponds 

to  X  =  +  oo.)  These  values  are  substituted  into  the  righfhand  sides  of 
(2.17)  “  (2.19)  and  new  estimates  and  are  determined  on  the  left- 
hand  sides  of  (2.17)  and  (2.19).  The  process  is  repeated  until  the 
absolute  or  relative  difference  between  successive  estimates  reaches  pre- 
assKgned  tolerance  levels.  Convergence  of  this  process  is  not  guaranteed 
for  all  values  of  X  and  more  than  a  single  solution  for  B  and  a  may 
exist  for  a  range  of  values  of  X.  We  have  succeeded  in  constructing  an 
instance  of  multiple  solutions.  Extensive  experience  with  this  procedure 
is  a  wide  variety  of  applications  and  in  simulation  trials  has  failed  to 
uncover  any  difficulty  with  multiple  solutions  or  with  convergence  as 
long  as  X  is  not  chosen  excessively  small  (possibly  negative).  Since  the 
statistical  properties  of  the  estimators  also  depend  on  X,  we  now  investi¬ 
gate  this  dependence. 


3.  Some  Properties  of  the  Estimators 

The  form  of  estimating  equations  (2.15)  and  (2.16)  show  that  the 
-  -2 

estimators  8  and  a  are  M-estimators.  The  dependence  of  the  estimators 
on  X  has  been,  and  will  continue  to  be,  suppressed  for  notational  con¬ 
venience.  It  should  however  be  remembered  that  we  are  considering  a  class 

-  -2  2 
of  estimators.  The  estimators  6  and  a  are  consistent  for  B  and  a  for 

all  X  >  -  i  under  the  assumptions  of  section  2  (Bryant  and  Paulson,  1979). 

-2 

The  estimators  are  also  asymptotically  normally  distributed  with  B  and  o 
asymptotically  independent  (Thornton  and  Paulson,  1977)- 
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The  quantity  Jn  has  been  used  to  produce  score  functions  for  esti- 
2 

mating  3  and  o  .  J  does  not  provide,  except  in  a  special  sense,  a  bona 
fide  objective  function.  Objective  functions  are  easy  to  develop  for 
location  problems  when  the  underlying  parent  is  symmetric.  Score  functions 
are  also  easy  to  produce  for  location  problems  when  the  underlying  parent 
is  symmetric.  Both  are  much  harder  to  develop  for  non-location  problems 
or  non-symmetric  parents. 

The  asymptotic  variances  are  readily  developed  from  the  score 

functions  by  the  standard  expansion  arguments  (Cram6r,  pp.  500-503,  for 

example)  from  which  asymptotic  normality  is  derived.  We  sketch  the 

n 

development  for  3  from  equation  (2.15)  •  i-et  S  *  J  s„.  in  this  equation. 

n  j»l  3J 

If  3q  is  the  true  value  of  3,  we  expand  Sn  around  3q  to  get 


3s.  . 

n 


0  -  l  sg;  *  I  se  i  +  l  - J-  (5"eo)  +  Rn 

j-1  PJ  j=i  B0J  j=1  33 


(3.1) 


where  the  (p+l)xl  vector  R  is  a  remainder  term  and  sQ  .  indicates 

n  b0J 

2 

sQ1  is  evaluated  at  3n.  Set  d  *  \a  .  By  the  multivariate  central 

3j  n  0 


that 
1  imi  t 


theorem  ^  s  .  is  asymptotically  (p+l) 

j-1  ‘V 

0  and  covariance  matrix 


variate  Gaussian  with  mean  vector 


G 


n 


d 


Ao 


which  may  be  explicitly  evaluated  as 


G 

n 


j-1 


T 

Xj  *j 


(3.2) 


after  some  algebraic  reduction. 
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This  truncated  expansion  suggests  that  as  n  increases  (B-6q)  is  becoming 
e  (p+1)  variate  Gaussian  with  mean  vector  J  and  variance-covariance  matrix 


cov(g)  *  h"1  a  h"1 
n  n  n 


2  /4+8X+4X2\  3/2  /%,T„X-1 
'3+8X+4X2' 


(X’X) 


(3.3) 


since 


Hn  “  E 


(  ?  ^r) 

'  j-1  36  ' 


d  -  Xa 


3/2  n 


/J±2X>  7  XTX 

V 2+2x7  ji,  xj  xj 


3/2 


(Mr)  »V- 


(3.4) 


The  arguments  required  to  produce  cov(B)  are  not  given  here.  The  co- 
variance  matrix  of  (3-3)  should  be  inflated  by  a  factor  similar  to 
n/(n-p-l)  to  reduce  the  bias.  We  suggest  that  it  be  inflated  by 

tr(V  ) 

- Y — rr? —  <3.5) 

tr(Vx-VxX(X  V^X)  ‘Xl Vx) 
which  becomes  n/(n-p-l)  when  X»«. 

_  2 

The  asymptotic  variance  of  o  is  also  determined  from  a  truncated 
Taylor's  series  expansion  of  (2.16)  and  some  straightforward  but  tedious 
computations.  We  find 

/1+2X\i  6+8*+4x2  / 1+2x\  3 

var(o2)  -  o4  4  SESt  JHEEZ : A***2  ■  .  (3.6) 

9  /  t  \  2  /1+2X\5 

\1+2X/  V2+2X/ 
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The  asymptotic  efficiency  of  8  relative  to  B  is  obtained  from  (3.3)  as 

.a) 

v  4+8x+4xZ/ 


-2  a  2 

that  of  5  relative  to  o  is  obtained  from  (3-6)  as 


(3-7) 


e(S2) 


2/'i±2i\  5 

9  \H-2A)  \  2+2X! _ 

2  /Jl2X\±  6+8A+4X2  /1+2A\3 

\ 3+2X/  (3+2x) 2  ’  \2+2X/ 


(3.8) 


These  efficiencies  are  provided  in  Table  1.  A  value  of  X*2  produces 

2 

efficiencies  of  about  .96  for  estimating  8  and  about  .94  for  a  .  The 

2 

efficiency  declines  as  X  decreases  from  +».  The  estimators  B  and  a 
of  (2.17)  and  (2.18)  are  still  well  defined  even  when  -  i  <  X  <_  0  but 
observe  that  f  (y-y;)  in  (2.8)  is  defined  as  a  Dirac  delta  function  at 

W  J 

2 

y-yj  when  X-0  and  d  *  Ac  . 


TABLE  1 

Efficiencies  of  the  Estimators  B,  a  for  Selected  Values  of  X 


Estimator 

0 

•  5 

i 

2 

4 

00 

8 

.65 

.84 

.91 

.96 

.99 

1 

q2 

•  54 

00 

oo 

.94 

.98 

1 
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tnfluence  functions  describe  the  behavior  of  an  estimator  as  a 

function  of  a  single  additional  observation.  An  additional  observation 

in  our  framework  is  the  pair  (y,x)  with  x  considered  given.  The 

_  2 

influence  function  for  o  at  the  Gaussian  distribution  with  mean  x6  and 
2  2 

variance  a  ,  i.e.  N(xg,a  )  in  short,  is  determined  from  the  score 
function  obtained  from  (2.16).  The  score  function  is  divided  by 

to  give  the  influence  function. 

d-Ao2 

We  obtain  after  some  algebraic  reduction 


I C(y ,x  ;52,N) 


urn) 5/2  i<y-*«2  -(-*-*££) 


a  (1+2A) 


(3-9) 


This  influence  function  is  bounded  in  the  residual  y-xS,  and  redescends 

to  an  asymptote  greater  than  zero.  Accordingly,  even  pairs  (y,x)  which 

give  rise  to  very  large  residuals  produce  a  contribution  to  the  estimate 
2 

of  a  when  A  <  «. 

The  influence  function  for  §  at  the  Gaussian  distribution  is  a 
little  more  difficult  to  obtain.  We  derive  a  finite  sample  version. 

The  infinite  sample  version  is  obtained  by  an  appropriate  passage  to  the 
limit.  To  estimating  equation  (2.15)  we  add  the  score  function  associated 
with  the  additional  observation  (y,x)  to  get 

Vi  -  *n  ♦  »V*> 


-13- 


I f  we  define 


Hn+1  *  E 


d*Xa 


then 


«„♦,-(££) 3/2  <xTx*~T>. 


where  the  last  equality  follows  from  (3.4) .  The  inverse  of  Hn+1  is 
(Belsley,  Kuh,  Welsch,  1981 ,  p.  6A) 


u-1  _  / 2+2X\  3/2  |  /vTvN-1  (XTX) -1 xTx(XTX)_ 1  | 

Vl  l<xx>  7;-X(Xtx,-i  xt  I  • 

A  finite  sample  version  of  the  influence  function  for  3  at  the  Gaussian 
distribution,  given  the  Xj  and  x,  is  defined  as  the  normalized  difference 
(see  Barnett  and  Lewis,  1978,  p.  137) 


ICn(y,x;6,*0  -  <r»l>  <£,  Sn+,  -  H~'  Sn> 


d*Xo 


(n+l)  3/2  |(XTX)-'  -  i&X'J  I  „V«8)  5XP(.  4 

Zx  1  1  +  x  (XTX)  1  xT  1  V  0Z(1+2X)/ 


(3.10) 


The  argument  (n+l)  in  (3.10)  is  the  normalizing  factor.  The  influence 
function  for  3  is  bounded  and  redescending  in  he  residual  y-xg  but  is  not 
bounded  in  x.  Thus  a  single  point  out  of  n+1  can  completely  change  the 
character  of  a  regression  estimate  6.  This  is  due  in  part  to  the  fact  that 
a  regression  model  represents  a  one-dimensional  summary  or  an  index  of 
a  much  more  complicated  phenomenon  in  a  higher  dimensional  Euclidean  space. 


L 
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Unless  Che  higher  dimensional  aspects  of  the  phenomenon  are  taken  into 
account,  we  run  the  risk,  even  in  a  critical  or  robust  setting,  of  pro¬ 
ducing  inappropriate  summary  models  for  multidimensional  phenomena.  Some 
approaches  to  bound  the  influence  of  points  far  out  in  factor  space  are 
discussed  in  Belsley  Kuh  and  Welsch  (1980,  Ch.  2)  and  Krasker  and  Welsch 
(1979)  where  further  references  may  be  found.  We  do  not  consider  the 
subject  in  any  further  depth  in  this  paper. 

Influence  functions  are,  of  course,  only  one  measure  of  qualitative 
robustness.  We  have  emphasized  the  influence  curve  here  because  it  is 
the  most  important  and  because  other  qualitative  measures  of  robustness 
such  as  gross  error  sensitivity,  local  shift  sensitivity,  rejection  point, 
and  breakdown  point  (see  Barnett  and  Lewis,  1978,  pp.  140-141)  can  be  found 
once  influence  functions  have  been  provided.  For  example,  the  rejection 
point  for  B  is  infinite  since  the  Euclidean  norm  ||  I Cn (y ,x;B ,N)  ||  will  not 
be  zero  for  any  (y,x)  if  the  residual  y-xfJ^O  or  x^O.  The  estimators  B  and 
a  have  breakdown  points  in  excess  of  k0%  for  nMO.  There  will  be  regions 
of  (y,x)  space  where  (y-xB)  will  dominate  the  behavior  of  ICn(y,x;B,N)  and 
others  where  x  wi 1 1  dominate.  Even  though  x  is  fixed,  it  may  be  advantageous 
to  regard  it  as  being  variable  in  order  to  force  the  influence  curve 
ICn(y,x;f5,N)  to  behave  less  radically  in  x,  especially  for  x  of  high 
d i mens i on . 

It  is  clear  that  |jim  B  *  §»  the  least  squares  estimator  of  B- 
By  L'Hospital's  rule  we  find  that 

1 im  a  *  a  *  n  l  (y .-x.BI  . 

A-h»  j-1  J  J 


-15- 


Finally,  there  is  a  small,  but  non-zero,  probability  that  the  estimator 

.2 

o  will  be  negative.  This  will  be  of  no  consequence  in  practice. 


4.  Examples 


We  now  illustrate  and  expand  on  the  results  of  the  previous 
sections  through  several  examples.  We  will  discuss  first  scalar  data, 
then  regression  models,  and  move  finally  to  the  analysis  of  designed 
experiments. 

Example  4. 1 .  The  sixteen  observations  in  Table  2  are  taken  from 

Quesenberry  and  David  (1961)  and  are  putatively  from  a  Gaussian  population 

N(jj,ct  ).  Even  without  formal  analysis  the  paired  observations  .74  and  1.09 

are  suspicious  with  respect  to  a  single  Gaussian  parent.  The  range  test 

of  Quesenberry  and  David  just  barely  rejects  the  observation  1.09  as  an 

outlier  at  significance  level  .05.  The  reason  for  this  is  that  the 

presence  of  the  three  rightmost  observations  produce  an  inflated  internal 
2 

estimate  of  o  .  For  our  analysis,  a  choice  of  the  value  X  is  required. 

If  our  choice  is  to  be  based  on  efficiency  we  refer  to  Table  1  to  obtain 
a  suitable  value.  The  estimates  for  A«2  are  provided  in  Table  2.  Also 
presented  in  Table  2  are  weights  Wj^  associated  with  each  observation. 

These  weights  are  determined  from  the  final  iteration  of  the  estimation 
algorithm  and  in  this  case  are 


(Yj-V)2 

o2(1+2A) 


)■ 


jX‘ 


Table  2 


,  _  ? 

Integrated  Distance  Estimates  Bg,  Og  and  Weights 
Wj^(xlO)  for  Selected  values  of  A 


Observation 

A~" 

X-2 

X-.5 

A-0 

1 

.32 

.625 

.62 

.58 

.44 

2 

.35 

.625 

.65 

.67 

.63 

3 

.37 

.625 

.66 

.72 

.75 

4 

.38 

.625 

.67 

.74 

.80 

5 

.39 

.625 

.67 

.76 

.85 

6 

.44 

.625 

.69 

.82 

.98 

7 

.45 

.625 

.69 

.82 

.98 

8 

.46 

.625 

.69 

.82 

.97 

9 

•  47 

.625 

.70 

.81 

.94 

10 

.48 

.625 

.70 

.81 

.91 

11 

.52 

.625 

.69 

.75 

.70 

12 

.53 

.625 

.69 

.73 

.64 

13 

.57 

.625 

.67 

.63 

.40 

14 

.74 

.625 

.53 

.17 

.  75(~2) 

15 

.74 

.625 

.53 

.17 

.75(-2) 

16 

1.09 

.625 

.15 

.34(-3) 

. 10(-9) 

W 

A 

- 

.27 

.07 

.  08  (—  1 ) 

Mean 

.52 

.48 

.45 

.44 

s.d. 

.19 

.16 

.11 

.10 
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The  magnitudes  of  these  weights  sene  to  rank  the  observations  according 
to  their  contribution  to  the  parameter  estimates  of  u  and  a  -  As  \ 
decreases,  the  role  of  observations  14,  15,  and  16  declines  dramatically. 
This  is  entirely  consistent  with  the  density  estimation  aspects  of  the 
procedure. 

In  this  scalar  case  we  have  From  (2.8)  we  determine  a 

score  function  for  0Q  through 


3  ?  r  8fw<V>*Mv> 

38^  Jn  “  *  4,1  l  J  - ae"-1 —  (fw(y-Vj>  -  fw(y)*fj(y))dy  . 

—00 


=  -  kirn 


r 


df„(y)*f(y)  1  n 


36r 


(  ^  I  (fjry,)  -  fw(y)*f(y))l  dy 


J-i 


w  'j 


y  ~  3 

since  f j  (y)  -  (2ira2)  *  exp^-  i  ( 2)  *  f(y),  say,  is  independent  of 
2 

j.  With  d*Xo  we  have  on  the  final  iteration  the  estimate  of  f  (y)*f(y), 


say  g.(y),  is  from  the  last  expression 


gA(y)  ■  ~  I  — exp/-  ■—(  -5— i-)  ) 

X  n  j-1  (2ttXo2)  *  \  2\\  0  /  / 


The  density  estimate  g^(y)  is  plotted  in  Figure  1  for  several  values  of  X. 

As  X  decreases,  g^(y)  "sees"  observations  14,  15,  16  as  different  from 

the  remainder  of  the  set  and  this  is  reflected  in  the  decreasing  values 

of  the  weights  associated  with  these  observations. 

As  X  decreases  from  infinity  to  zero,  the  estimated  mean  and  standard 

deviation  decrease  from  .52  and  .19  respectively  te  .44  and  .10  respectively. 

2 

The  estimates  for  y  and  o  remain  nearly  constant  as  x  is  taken  from  zero 
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to  -  and  thus  achieve  stability.  In  general,  if  the  data  are  Gaussian, 
the  mean  and  variance  do  not  fluctuate  appreciably  as  X  decreases. 

The  magnitude  of  the  weights  at  the  final  iteration  provide  an 
attractive  index  of  the  role  each  observation  is  playing  in  the  interplay 
between  assumed  model  and  the  observations.  Low  weighted  observations 
highlight  nonconsonance  of  the  observations  and  the  model.  This  dis¬ 
agreement  may  be  due  to  one  or  more  causes.  We  are  not  recommending  that 
low  weights  be  used  as  a  rejection  criterion  for  outliers  but  as  a  signal 
that  these  observations  merit  further  attention.  If  an  observation  has 
a  low  weight  it  is  a  potential  outlier;  or  indicates  a  potential  problem, 
the  nature  of  which  may  be  very  complex.  Only  discordant  outliers  will 
have  low  weights.  The  normalized  weights  w.^  are  presented  in  Table  2. 

The  weights  are  equally  informative  since  it  is  the  relative  magnitudes 
of  the  v.  and  w...  that  provides  an  indication  that  something  might  be 

J  A  J  A 

awry.  It  would  be  useful  to  have  a  benchmark  weight  for  easy  spotting  of 
observations  which  require  particular  attention  vis-a-vis  the  assumed 
model.  Somewhat  arbitrarily,  we  choose  as  a  benchmark  weight  the  value 
corresponding  to  an  observation  3  standard  deviations  from  the  mean. 

/  (yf*1)2  \ 

That  is,  we  set  y.  *  p  +  3c  in  v„  ■  exp  (-  i  — ■* - )  and  define 

1  \  / 


j* 


52(1+2A) 


w  »  exp(  -  4.5  (1+2A)'1) 
A  n 

l  v„ 


j-1 


J* 


At  A*2  c‘..!y  observation  16  is  less  than  W.  .  At  A-0  observations  14,  15. 


16  are  less  than  W^. 


The  role  of  the  v..  is  not  unlike  the  introduction  of  normal  prob- 

ability  paper  into  the  estimation  scheme  in  that,  as  X  decreases,  less 

and  less  weight  is  attached  to  observations  which  depart  from  linearity 

and  more  and  more  weight  is  attached  to  subsets  which  are  characterized 

by  linearity.  We  shall  not  always  be  able  to  ascribe  reasons  to  such 

n 

departures.  The  sum  £  v..  contains  useful  information  concerning  the 

j-1  J 

agreement  of  the  data  with  the  assumed  Gaussian  model.  The  expression 

n 

(2.14),  which  essentially  contains  J  v.^,  has  been  used  by  Bryant  and 

j-1  J 

Paulson  (1981)  in  a  goodness-of-f It  context. 

We  make  the  above  remarks  (which  apply  more  generally)  at  this  point 
because  it  is  easy  to  see  what  is  happening  with  a  simple  Gaussian  error 
model.  This  will  not  be  the  case  when  the  Gaussian  error  model  is 
combined  with  structural  models. 


Example  4.2.  The  data  for  this  example  were  given  by  Mickey,  Dunn 
and  Clark  (1967)  and  re-examined  by  Andrews  and  Pregibon  (1978).  In 
Table  3»  x  denotes  the  age  of  a  child  in  months  at  first  word  and  y 
denotes  the  Gesel 1  Adaptive  score.  The  model  being  fit  is  y  -  Bg  +  B^x. 
Only  pair  (17,121)  receives  a  low  weight  for  X-1  and  X»i  relative  to  the 
remainder  of  the  observations.  This  observation  does  not  have  much 
influence  on  the  estimates  of  8q  and  8^.  Pair  18,  (42,57),  does  since 
it  is  far  out  in  x-space.  The  distances  (2.7)  and  (2.8)  are  primarily 
concerned  with  residuals  Yj"xj6  as  an  index  of  the  two-dimensional  quant¬ 
ities  (yj ,Xj) ,  j  -  1,2,. ...n.  One  dimensional  models  cannot  capture  or 
summarize  all  the  information  available  in  multi-dimensional  data  unless 
the  model  assumptions  are  exactly  met.  An  appropriate  handling  of  this 
matter  seems  to  require  multivariate  methods.  Our  results  agree  with  those 


Table 


Age  at  First  Word,  Gesel 1  Adaptive  Score,  Parameter 
Estimates,  and  Weights  w.^(xio) 


Case 

Age  x 

Score  y 

X*CO 

X=1 

X=.! 

1 

15 

95 

.48 

.55 

•  58 

2 

26 

71 

.48 

.50 

.50 

3 

10 

83 

.48 

.36 

.30 

4 

9 

91 

.48 

.*»9 

.47 

5 

15 

106 

.48 

.47 

.46 

6 

20 

87 

.48 

.55 

.59 

7 

18 

93 

.48 

.54 

.56 

8 

11 

100 

.48 

.55 

•  58 

9 

8 

104 

.48 

.54 

.58 

10 

20 

94 

.48 

.50 

.50 

11 

7 

113 

.48 

.45 

12 

9 

96 

.48 

.54 

.57 

13 

10 

83 

.48 

.36 

.30 

14 

11 

84 

.48 

.41 

.36 

15 

11 

102 

.48 

.53 

.58 

16 

10 

100 

.48 

.55 

.59 

17 

12 

105 

.48 

.48 

.47 

18 

42 

57 

.48 

.55 

.59 

19 

17 

121 

.48 

.10 

.04 

20 

11 

86 

.48 

.44 

.41 

21 

10 

100 

.48 

.55 

•  59 

WX 

.48 

.11 

.05 

S0 

109.87 

110.53 

111.14 

B1 

-  1.13 

-  1.22 

-  1.25 

cr(adj  us  ted) 

11.02 

10.11 

9.87 
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of  Andrews  and  Pregibon. 


5.  Design  Problems 


For  reasons  of  brevity  we  shall  restrict  our  attention  to  relatively 
simple  situations  in  the  analysis  of  variance.  The  methods  we  shall  sub¬ 
sequently  present  can  be  used  for  much  more  complicated  situations.  The 
general  approach  that  should  be  taken  in  more  complicated  settings  will 
become  clear  since  it  closely  parallels  the  usual  least  squares  develop¬ 
ment. 

The  analysis  of  the  two-way  layout,  with  one  observation  per  cell 
when  multiple  outliers  are  present,  has  been  the  subject  of  particular 
attention  from  Daniel  (1978)  and  Gentleman  and  Wilk  (1975)-  It  is  note¬ 
worthy  that,  until  the  article  of  Daniel,  nearly  all  the  results  on  the 
two-way  analysis  of  variance  with  no  replication  and  outliers  have  been 
restricted  to  the  case  in  which  at  most  one  or  two  outliers  are  present  in 
the  layout.  Daniel  and  Gentleman  and  Wilk  obtain,  after  much  ingenious 
and  careful  analysis,  useful  criteria  for  the  identification  of  possibly 
spurious  observations. 

The  mathematical  model  for  the  two-way  layout  is 


yjU 


»+aj+6k+  ejk£ 


(5.1) 


where  represents  the  £th  response  in  the  (j,k)  cell,  l  -  1 ,2, . . .  .n^, 

j  ■  1,2,..., a,  k“  1,2, — ,b,  y  is  the  grand  mean,  cu  is  the  jth^  row 
effect,  8^  is  the  kth  column  effect,  and  is  a  normally  distributed 

error  with  zero  mean  and  variance  a  .  We  make  all  the  standard  assumptions 
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concern  fng  error  structure  In  the  two-way  tables. 
The  cell  j,k  has  sample  estimate 


*jk£(u)"  exP ( ' uy j j^) 


with  corresponding  expected  value 


tjU(u)  =  exp(iu(p+aj+6k)  -  i  a  u  ) , 


(5.2) 


(5.3) 


£  -  1,2 . njk-  Proceeding  along  the  lines  of  section  2  and  with 

referral  to  (2.14),  the  parametric  estimates  p,  a ^  ,  $k  are  obtained  by 
producing  a  set  of  score  functions  from  the  sum  of  integrated  squared 
residuals 

3  b  ^-1 4 

J  -  l  l  I  j  |$j|^(u)  ■  ^jlc£.(u)|2  exp(-du2)du 

i®1  k*  1  1 

**  —  OO 

-iii  [  (f>*  -  *  «p(-  i  (y^;u:Ye^. ) 

j  k  i  1  d  V2d+oZ/  2d  +  a 


‘1  • 


(5.4) 


We  obtain  the  score  functions  by  differentiating  (5.4)  with  respect  to 

2  2 
p,  a.,  0,  ,  a  and  then  setting  d  -  Aa  .  We  obtain 
J  * 


I  J  J  (»  ♦  «j  ♦  V^x  -I  I  |  VjWvjWii 


(5-5a) 


£  (w  +  “j  +  Sk)vjlce,A  “  ][  J  yjk£vjk£,A’  j  "  1,2,"‘,a  (5.5b) 
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?  I  (w  +  °ij  +  »k)vjk£>x  "  £  |  yjk£vjk£,A’  k  *  U 2. •••.!>.  (5.5c) 


An  estimate  of  the  variance  is  determined  implicitly  from  (5-51  and 


,  III  <VjW  ‘  "  '  °J  ‘  Bk>  Vi^-» 

7+2A 


]  i  l  3/2  > 


(5.6) 


where 


vjk£,A 


exp 


(y ^  -  o.  -  e, ) 


a  (1+2A) 


(5.7) 


The  rank  of  the  system  (5-5)  for  fixed  is  clearly  (a+b+1)  -  2  since 

summation  of  ( 5 - 5b)  over  j  gives  (5. 5a)  and  summation  of  (5 - 5b)  over  j 
is  identical  to  the  sum  of  (5.5c)  over  k.  From  (5.5a)  we  see  that  it  is 
natural  to  augment  the  system  (5-5)  with  the  two  side  conditions, 


f  i  |  °j  vju  *  »•  (5-8a) 

n  |  sk  vjw  “  °-  (5-8b) 


The  system  (5-5)  augmented  with  (5*6)  leads  to  the  statistically  and 
computationally  appealing  recursive  system 


j  k  1  Yjk£  Vjlcfc,X 

]  1 1  vJw-i 

j .  1>2>. 

k  I  vj u,\ 


,a-1 


(5.9a) 


(5.9b) 
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l  |  (yjW  '  "  '  s>i 


y  jk^.x 


]  \  * >**■•- 


k  »  1 ,2, . . .  ,b-1 


(5.9c) 


a-1  b  njk  a  b-1  nJk 

111  ajvj|c£  X  111  ®kVjk£,X 

j-1  k-1  £-1  J  JI0C,A  j-1  k-1  £-1  JM^A 


«  —  J  »  ™  ;  n  _  _  J  ■  ■  ; 

“a  b  njk ■  6b  a  nJk 


?  T 

k-1  tm 1  ak£»A 


(5.9d) 


j-1  £«  1  ->b^»A 


for  fixed  v^^  The  implementation  of  the  estimation  procedure  is 
straightforward.  Choose  initial  estimates  a,  y,  a.,  5.,  evaluate  v... 

J  K  J  K £.$  A 

for  the  initial  estimate.  Now  index  the  left-hand  sides  of  (5.9)  and 
(5.6),  by  (nH-l),  the  right-hand  sides  of  these  equations  by  m.  Iterate 
until  a  pre-determined  absolute  or  relative  tolerance  on  all  parameters 
is  met.  Once  v.^  ^  is  fixed,  all  operations  are  purely  linear.  Thus 
this  algorithm  is  computationally  efficient,  especially  for  large  systems. 
Local  solutions  to  the  system  (5-6)  -  (5-9)  can  be  encountered  when  X 
becomes  too  small,  but  difficulties  can  be  avoided  by  starting  with  least 
squares  (A=“°)  estimates  and  decreasing  X  gradually.  This  point  will  be 
subsequently  discussed. 

Suppose  now  that  we  wish  to  perform  a  critical  analysis  on  a  two- 
way  layout  with  a  single  covariate.  The  analysis  for  multiple  covariates 
will  be  similar.  The  model  is 


,xjw *  V 


(5.10) 


t  ■  1,2,...,nj^.  The  definition  of  terms  is  analogous  to  those  given 
earlier  in  this  section.  Just  as  in  the  above,  we  ultimately  find  the 


estimating  equations  to  be  implicitly  given  by 


l  l  (yjk£  '  v  '  yxjk£)vjk£,X 


a . 
J 


k  l 


£  I  VJW’X 

■  v  -  a,  -  yx..  Jv.. 


"»  J  *  ^  »  3“  1 


l  \  (yjkl  ~  M  '  “j  ~  rAjk£'vjk£,A 

]  \  vi«.' 


,  k  *  1 ,2, . . . ,b-1 


l  |  \  (yjk l  -  “  -  “j  -  ek)xJkl,XVjkt,X 


|  |  Xjk£vjk£,x 


,  jJJ  (yjkZ  •  11  -  °j  •  ek  -  yxjktlyJkt.x 


1+2  A 


f  l  l  <Vjkt,X  -  W)  3/2> 


(y  jk£  ’  p  -  aj  "  gk~yxj Uy 


'ik/  \  *  expr  *  9 

Jkt-A  1  a2(l+2A) 


along  with  the  two  constraints 


5H“j  v-iw'1 '  1 1\  ®k 


The  pattern  for  other  layouts  follows  the  usual  least  squares 


(5.11a) 


(5.11b) 


(5.11c) 


(5. lid) 


(5. He) 


(5. Ilf) 


approach.  The  structure  of  the  estimators  is  the  same  apart  from  the 
weighting  aspects. 
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6.  Design  Examples 

Example  6.1.  We  present  first  the  two-way  layout  with  n^-l 
analyzed  in  great  detail  by  Daniel  (1978).  Both  the  data  and  one  of  our 
analyses  are  summarized  in  Table  4.  The  first  tabular  entry  is  the  data 
quoted  by  Daniel,  the  percentage  of  men  aged  55“64  with  hearing  l6dbs  or 
more  above  audiometric  zero.  The  second  tabular  entry  is  the  final 
weight  w..  _  obtained  from  a  critical  analysis  with  X*.5. 

JK  ,  .  > 

Daniel  "localizes"  the  non-additive  observations,  computes  the 
required  variance  estimate  from  the  additive  portion  of  the  layout,  and 
identifies  the  observations  in  cells  (3*1).  (3*2),  (4,3),  (5*3),  and 
(6,3)  as  outliers.  As  Daniel  points  out,  such  elaborate  methods  are 
necessary  since  probability  plotting  and  one-at-a-time  methods  would 
prove  to  be  ineffective. 

In  analogy  to  the  benchmark  weight  in  example  4.1,  we  define  the 
weight  W  *  exp(-4.5(1+2X)  *)/  Y  v..  ..  With  X». 5  we  see  that  observa- 

A  j  k 

tions  (3.1),  (3,2),  (4,3),  (5,3),  (6,3)  and  (5,1)  have  extremely  low 
weights  w^  ^  associated  with  them.  The  first  five  Daniel  has  set  aside 
as  outliers.  While  Daniel  did  not  consider  the  disturbance  in  (5,1)  to 
be  of  sufficient  magnitude  to  designate  it  an  outlier,  it  is  the  sixth 
largest  deviation  in  absolute  value  found  by  him.  Daniel  appears  to 
be  aware  of  the  presence  of  this  disturbance  and  others  identified  via 
their  low  weights  in  Table  4  since  he  notes  at  one  point  that  the  variance 
estimate  from  the  unperturbed  portions  of  the  table  is  most  likely  too 
large  due  to  the  presence  of  some  smaller  disturbances.  The  adjusted  inte- 
grated  distance  estimate  of  c  is  (-jg-)  (7-29)  *  9.92.  Daniel's  estimate 


Percentage  of  Men  with  Given  Hearing  Level  for  Frequency  Level  and 
Occupation  k  (1st  tabular  entry).  Weights  w..  c(*10)  (2nd  tabular  entry)  and  Estimated  Effects 
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^Outliers  identified  by  Daniel 


2 

of  a  after  removal  of  the  five  outliers  is  10.40.  Even  with  the  very 
small  weights  associated  with  the  five  observations  explicitly  set  aside 

by  Daniel,  these  observations  still  have  a  non-zero  influence  on  the 

2  2 
estimate  of  o  .  It  would  seem  desirable  to  have  an  estimator  for  a 

whose  influence  function  redescends  to  zero. 

There  are  at  least  six  observations  which  are  not  consistent  with 
the  underlying  model.  The  reasons  for  the  inconsistencies  cannot  be 
completely  ascertained  from  the  analysis  but  there  are  some  comments 
which  seem  appropriate.  From  our  utilization  of  the  available  infor¬ 
mation  we  conclude  that  there  are  severe  distributional  difficulties 
associated  with  a  two-way  analysis  of  this  data.  Since  the  data  consist 
of  percentages  between  1.4  and  93-6  the  constant  variance  assumption  is 
violated.  Further,  the  proportions  are  not  based  on  equal  numbers  per 
cell  which  implies  that  an  arcsin  transformation  would  not  be  helpful; 
it  is  not,  the  results  remain  virtually  unchanged.  Although  we  have 
no  means  whereby  we  can  separate  effects  of  various  departures  from 
assumptions,  the  independence  assumption  of  the  two-way  model  is 
violated  because  the  last  row  of  the  table  is  a  linear  combination  of 
the  first  three;  the  deletion  of  the  seventh  row  does  not  modify  the 
results.  It  is  also  unlikely  that  the  occupation- frequency  cross¬ 
classification  exhausts  the  potential  sources  of  variability. 

The  direct  analogue  of  Figure  1  is  not  possible  in  this  case 
because  the  error  distribution  is  being  reconstructed  from  lack  of  fit 
components.  The  final  weights  v..  ..  or  w..  .  provide  a  different  but 

JKfA  JK f A 

equally  informative  summary  of  the  density  fitting. 


When  we  set  A-0  in  this  analysis  we  encounter  a  local  solution 
in  which  only  twelve  of  the  weights  are  non-zero.  This  solution  is 


also  dependent  on  starting  values.  This  phenomenon  is  partly  due  to  the 
fact  that  a  reliable  error  distribution  cannot  be  reconstructed  from 
this  data  at  such  an  intense  level  of  scrutinization  as  given  by  A*0. 

Example  6.2.  The  data  for  this  application  comprising  the  first 
and  second  tabular  entries  of  Table  5  are  taken  from  Rao  (1965,  p.  245) 
and  has  recently  been  analyzed  by  Hettmansperger  and  McKean  (1977)  in 
a  regression  context.  The  weekly  growth  rate  data  have  been  treated  as 
a  two-way  layout,  sex-food  combination  versus  pen,  since  sex-food  combi¬ 
nations  are  also  of  primary  concern.  The  relevant  linear  model  is  given 
by  (5*10)  with  n^-l.  y^  is  the  weekly  growth  rate,  x^  is  the  initial 
weight  associated  with  food-sex  combination  j and  pen  k,  cu  is  the  effect  of 
food-sex  combination  on  growth  rate,  g^  is  the  effect  of  pen  k  on  growth 
rate,  and  y  is  the  rate  of  change  of  response  with  initial  weight.  The 
results  of  the  analysis  with  A-.5  are  also  given  in  Table  5.  The  analysis 
is  conducted  in  accordance  with  equations  (5*11).  The  unadjusted  estimate 
of  a  is  a  “.067;  the  adjusted  estimate  is  .067  (y^)  *  .106.  We  determine 
that  the  benchmark  weight  W  _  ■  .0013.  The  observations  in  cells  (2, AH) 
and  (2,BK)  have  associated  weights  less  than  W  c  and  deserve  special  atten- 

•  J 

tion.  Compare  W  _  with  the  boldface  entries  of  Table  5.  Observations  in 
•  -> 

cells  (3,CG) ,  (4,BG),  and  (5, AG)  also  have  relatively  low  associated  weights 
and  are  all  about  2.4  (adjusted)  a  standard  deviations  removed  in  the  left 
tail.  These  also  deserve  special  attention.  These  five  observations  exert 
a  substantial  influence  on  the  estimates  for  the  effect  of  sex-food  combi¬ 
nation.  The  superiority  of  sex-food  combination  AG  and  of  food  A  is  much 
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more  apparent  under  our  analysis  than  under  ordinary  least  squares  analysis. 
Hettmansperger  and  McKean  (1977)  also  determined  that  cells  (2, AH)  and 
(2,BH)  deserved  special  attention. 

Examp 1 e  6.3.  This  3*4  factorial  experiment  with  four  replicates  per 
cell  was  first  discussed  at  length  by  Box  and  Cox  (1964).  Each  of  twelve 
poison- treatment  combinations  was  administered  to  a  group  of  four  animals;  the 
survival  times  are  recorded  in  Table  6.  Brown  (1975)  subsequently  used  these 
data  to  demonstrate  a  stepwise  procedure  for  detecting  the  cells  which  give 
rise  to  a  significant  interaction  in  the  analysis  of  variance.  Having 
formed  the  mean  survival  time  for  each  of  the  twelve  cells.  Brown  deter¬ 
mined  that  the  interaction  between  poison  and  treatment  was  localized  in 
the  ( 3 » B)  poison-treatment  cell.  Brown  suggests  the  analysis  that  results 
from  the  replacement  of  the  mean  value  in  cell  (3,B)  with  the  traditional 
missing  value  estimate  as  an  acceptable  substitute  for  performing  an 
analysis  of  variance  on  the  reciprocal  of  survival  time  -  as  originally 
suggested  by  Box  and  Cox. 

By  forming  the  cell  means  and  focusing  his  analysis  exclusively  on 
them.  Brown  failed  to  identify  the  true  character  of  the  data.  Table  6 
displays  the  OLS  residuals  and  the  weights  Wj^  ^  for  the  survival  times. 

The  twelve  groups  of  residuals  exhibit  a  sign  arrangement  with  probability 
of  approximately  .21  from  a  x2  goodness-of-f i t  test  of  this  or  a  more 
extreme  configuration.  The  weights  Wj^  5  in  comparison  with  W  j».0021 
obtained  from  fitting  the  usual  additive  model  indicate  that  observations 
(1,B,2),  (2,B ,1),  (2,B,4),  and  (2,0,2)  may  not  be  consistent  with  the 
assumptions  or  that  one  or  more  of  the  assumptions  may  not  be  consistent 
with  the  data.  These  four  observations  correspond  to  the  four  largest 


Table  6 

Survival  Times  (1st  columnar  entry), OLS  Residuals  (2nd  columnar  entry). 
Integrated  Distance  Weights  (*10)  (3rd  columnar  entry) 
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survival  times.  If  they  fell  in  one  cell  it  would  strongly  suggest  the 
presence  of  an  interaction.  However,  the  validity  of  the  normality 
assumption  is  considerably  more  questionable,  the  right  tail  of  the 
survival  distribution  being  too  thick  to  accommodate  the  normality  assump¬ 
tion.  A  plot  of  the  weights  Wj^  ^  against  the  X-.5  residuals  shows  this 
nicely.  A  transformation  of  the  data  seems  to  be  in  order.  Note  that 
some  of  the  largest  (in  absolute  value)  residuals  do  not  receive  the 
lowest  weights.  Thus  these  weights  are  not  just  a  transformation  of  the 
residuals  from  ordinary  least  squares. 

An  analysis  with  A*. 5  of  the  reciprocals  of  the  survival  times 
produces  a  set  of  weights  wj^  say,  which  are  much  better  behaved. 

Only  observation  (2, A, 4)  is  near  W*  .  Thus  the  transformation  was 

•  J 

appropriate. 

Example  6.4.  We  now  consider  a  set  of  data  from  an  unreplicated 
T?  experiment  attributed  (see  Barnett  and  Lewis,  1978,  pp.  244-246  for 
additional  background)  to  C.  Daniel.  The  first  two  columns  of  Table  7  der 
scribe  the  treatment  combinations,  and  the  corresponding  yields.  The  third  column 
provides  the  X-*  (OLS)  residuals;  the  fourth  provides  the  X-.5  residuals; 
the  fifth  provides  the  A-.5  weights  determined  from  straightforward 
extension  of  the  results  In  section  5.  A  single  pass  with  the  density  -  or 
characteristic  function  -  based  procedure  indicates  that  the  yield  of 
treatment  combination  A  stands  out  from  the  other  yields.  Thus  the  para¬ 
meter  estimates  are  not  stable  under  increasing  criticism  as  embodied  in 
decreasing  X  and  treatment  combination  A  is  highlighted  as  the  combination 
which  deserves  special  attention.  In  this  case  the  treatment  yield  is 
discordant.  We  thus  see  that  the  critical  procedure  based  on  integrated 


Table  7 

Summary  of  Analysis  of  a  2^  Factorial  Experiment 
Main  Effects  Model 


Treatment 

Combination 

Yield 

A*® 

Res i dua 1 s 

A-.5 

Res i dua 1 s 

X-.5 

Weights 

(1) 

121 

-15.25 

0.39 

0.16 

A 

145 

32.00 

64.21 

0.271- 

B 

150 

0.50 

0.39 

0.16 

AB 

109 

-17.25 

-0.78 

0.16 

C 

160 

4.00 

3.77 

0.14 

AC 

112 

-20.75 

-4.41 

0. 14 

BC 

180 

10.75 

-5.23 

0.13 

ABC 

152 

6.00 

6.60 

0.11 
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distances  can  be  useful  in  initial  detection  of  discordant  outliers. 

7.  Appropriate  Values  of  X 

We  regard  the  integrated  distance  procedure  as  both  an  exploratory 
procedure  as  well  as  a  robust  procedure.  As  an  exploratory  procedure  it 
can  be  made  increasingly  critical  in  nature  by  decreasing  X.  However,  X 
cannot  be  decreased  indefinitely  since  then  the  procedure  will  begin  to 
cluster  groups  of  observations  and  ultimately  each  observation  will  be 
regarded  as  a  separate  cluster.  At  some  point  before  this  latter  sta_».  is 
reached  the  procedure  may  no  longer  be  useful  because  of  the  possibility 
of  multiple  solutions  centered  around  various  subsets  of  the  data. 

If  the  procedure  is  to  be  used  in  an  exploratory  fashion,  then  X 

should  be  varied  to  determine  the  response  of  the  parameter  estimates  and  the 

observation  weights  to  this  variation.  We  would  start  with  X=“>  and  gradually 

reduce  the  value  of  X.  We  typically  next  take  X=1 ,  and  X=±.  Occasionally 

we  take  X*0  when  a  large  data  set  is  under  study.  If  the  parameter 

estimates  remain  constant  so  that  the  rate  of  change  of  these  estimates 

with  X  remains  near  zero,  then  the  data  and  the  structural  and  error  model 

will  be  judged  to  be  internally  consistent.  If  the  parameters  begin  to 

change  with  decrease  in  X,  there  will  be  a  corresponding  decrease  in  the 

weights  v.  or  v..  for  some  j  or  some  pair  (j,k),  for  example.  These 
J  A  J  X 

low  weights  indicate  the  most  sens! ti ve-to-cri ticism- interface  between  the 
data  and  the  error  and  structural  model  and  where  attention  should  be 
focused  in  evolving  a  decision  as  to  whether  the  data  or  the  model  assump¬ 
tions  are  at  variance  with  internal  consistency.  Thus  we  view  the  explora- 


1 
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tory  side  of  the  procedure  as  being  useful  in  mode)  evolution  as  well  as 
in  isolating  trouble  spots. 

If  we  regard  the  integrated  distance  procedure  as  a  robust  procedure, 
then  X  should  preferably  be  held  fixed.  Efficiency  considerations  might 
partially  dictate  a  particular  value  of  X.  On  the  other  hand,  the  procedure 
becomes  increasingly  qualitatively  robust  as  X  decreases  and  the  degree 
of  robustness  desired  may  partially  dictate  a  value  of  X.  The  robustness 
properties  are  due  to  the  connection  of  the  procedure  with  density  estimation. 
The  parameter  X  effectively  determines  the  window  width  in  parametric 
density  estimation. 

Finally,  we  note  that  it  is  possible  to  have  one  value  of  the  para* 
meter  X  for  the  location  parameter  component  and  another  for  the  scale 
parameter  component  of  the  procedure.  Such  a  situation  would  be  useful 
when  scale  is  a  nuisance  parameter.  For  example,  in  (2.17a)  we  would  use 

values  v.  and  in  (2.17b)  we  would  use  values  v... 

JA  J  A 

We  have  presented  the  results  for  X«*i  in  our  analyses  because  they 
would  generally  be  the  final  step  in  determining  the  sensitivity  of  model 
parameter  estimates  to  changes  in  X  and  especially  for  the  sake  of  brevity. 

The  extent  to  which  X  may  be  decreased  also  depends  on  the  experimental 
design.  If,  for  example,  we  were  to  critically  examine  data  from  a  Latin 
Square  design,  we  might  not  be  able  to  reduce  X  to  i  without  driving  some 
of  the  weights  Vj^  ^  to  zero  since,  from  a  density  estimation  point  of 
view,  only  a  relatively  small  number  of  observations  are  available  to 
reconstruct  the  error  density. 
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